# Vision Transformer Architecture
Sapiens Seg 0.6b
Sapiens is a family of Vision Transformer models pre-trained on 300 million human images at 1024x1024 resolution and designed for human-centric vision tasks.
Image Segmentation, English
facebook
16
0
Best Model ViTB16 GPT2
A cross-modal model based on Vision Transformer (ViT) and GPT-2 that generates natural-language descriptions of input images.
Image-to-Text
Transformers, Supports Multiple Languages

evlinzxxx
15
0
Dog Breeds Multiclass Image Classification With Vit
MIT
A dog breed classification model, fine-tuned from Google's Vision Transformer (ViT), that recognizes 120 dog breeds.
Image Classification
Transformers

wesleyacheng
584
4
Big Cat Classifier
An image classifier based on Vision Transformers that identifies five species of big cats.
Image Classification
Transformers

smaranjitghose
93
1
Featured Recommended AI Models